Winter 2026
Two steps
The antidote to a black box.
We will use a unified integration notation so that if random variable \(\theta \in \Theta\) has cumulative distribution function \(F\), then for some measurable set \(A \subseteq \Theta\),
\[ \mathbb{P}_F(A) = \int_A F(\mbox{d}\theta) \] which equals \[ \mathbb{P}_F(A) = \int_A f(\theta) \, \mbox{d}\theta \]
if \(\theta\) is continuous with density \(f\), and
\[ \mathbb{P}_F(A) = \sum_{\theta \in A} f(\theta) \]
if \(\theta\) is discrete with mass function \(f\).
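Both cases of the notation can be checked numerically. The sketch below uses a fair die for the discrete case and a standard normal for the continuous case (both choices are illustrative, not from the text), approximating the continuous integral with a simple midpoint rule:

```python
import math

# Discrete case: fair six-sided die, mass function f, event A = even outcomes.
f = {k: 1 / 6 for k in range(1, 7)}
A = {2, 4, 6}
p_discrete = sum(f[k] for k in A)  # P_F(A) as a sum over A

# Continuous case: standard normal density, event A = [0, oo),
# truncated to [0, 8] since the tail beyond 8 is negligible.
def phi(t):
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

n, lo, hi = 10_000, 0.0, 8.0
h = (hi - lo) / n
p_continuous = sum(phi(lo + (i + 0.5) * h) for i in range(n)) * h

# Both probabilities are 1/2 by symmetry.
```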
With this notation, expectations can be written as follows. For some function \(h(\theta)\),
\[ \mathbb{E}[h(\theta)] = \int_\Theta h(\theta) \, F(\mbox{d}\theta) \] which equals \[ \mathbb{E}[h(\theta)] = \int_\Theta h(\theta) \, f(\theta) \, \mbox{d}\theta \]
if \(\theta\) is continuous with density \(f\), and
\[ \mathbb{E}[h(\theta)] = \sum_{\theta \in \Theta} h(\theta) \, f(\theta) \]
if \(\theta\) is discrete with mass function \(f\).
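The same sum-versus-integral pattern applies to expectations. As an illustrative sketch (the choice \(h(\theta) = \theta^2\), the fair die, and the standard normal are all invented for this example):

```python
import math

def h(t):                  # the function whose expectation we want
    return t * t

def phi(t):                # standard normal density (illustrative f)
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

# Discrete: fair die, so E[theta^2] = (1 + 4 + ... + 36)/6 = 91/6.
f = {k: 1 / 6 for k in range(1, 7)}
e_discrete = sum(h(k) * f[k] for k in f)

# Continuous: theta ~ N(0, 1), so E[theta^2] = Var(theta) = 1.
# Midpoint rule on [-8, 8]; the tails beyond are negligible.
n, lo, hi = 20_000, -8.0, 8.0
step = (hi - lo) / n
e_continuous = sum(
    h(lo + (i + 0.5) * step) * phi(lo + (i + 0.5) * step) for i in range(n)
) * step
```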
Theorem
Let \(\{ \mathcal{S}, \mathcal{B}(\mathcal{S}), \mathbb{P} \}\) be a probability space.
Let \(\{ A_i : i = 1, 2, \ldots \}\) with \(A_i \in \mathcal{B}(\mathcal{S})\) be a partition of \(\mathcal{S}\), and let \(B \in \mathcal{B}(\mathcal{S})\) with \(\mathbb{P}(B) > 0\).
Then,
\[ \mathbb{P}(A_j \mid B) = \frac{\mathbb{P}(B \mid A_j) \, \mathbb{P}(A_j)}{ \sum_i \mathbb{P}(B \mid A_i) \, \mathbb{P}(A_i)} \, .\]
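The theorem translates directly into code. A minimal sketch for a finite partition (the function name and the example numbers are illustrative only):

```python
def bayes_partition(priors, likelihoods):
    """Given P(A_i) and P(B | A_i) over a finite partition,
    return the posterior probabilities P(A_i | B)."""
    evidence = sum(l * p for l, p in zip(likelihoods, priors))  # P(B)
    return [l * p / evidence for l, p in zip(likelihoods, priors)]

# Two equally likely hypotheses with P(B | A_1) = 0.9 and P(B | A_2) = 0.3.
post = bayes_partition([0.5, 0.5], [0.9, 0.3])  # [0.75, 0.25]
```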
Bayes’ theorem can be used to coherently incorporate evidence to update beliefs.
Two paths to the same conclusion:
Example
I have two coins: one fair and one two-headed.
I randomly select one of the two coins and flip it. Heads. Which coin was selected?
I flip the same coin again. Heads. Which coin was selected?
Flip it again. Heads. Which coin was selected?
Two-headed coin example
Before first flip: \(\mathbb{P}(\text{two-headed}) = 1/2\).
After first flip: \(\mathbb{P}(\text{two-headed} \mid H) = \dfrac{1 \cdot 1/2}{1 \cdot 1/2 + (1/2) \cdot 1/2} = 2/3\).
After second flip: \(\mathbb{P}(\text{two-headed} \mid HH) = 4/5\).
After third flip: \(\mathbb{P}(\text{two-headed} \mid HHH) = 8/9\).
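The sequential updates can be computed mechanically. The sketch below assumes one fair and one two-headed coin, each selected with probability \(1/2\), and applies Bayes' theorem once per observed heads:

```python
# Sequential Bayesian updating for the two-headed-coin example.
# P(heads | two-headed) = 1 and P(heads | fair) = 1/2.
p_two_headed = 0.5          # prior before the first flip
history = [p_two_headed]
for _ in range(3):          # three observed heads in a row
    numerator = 1.0 * p_two_headed
    p_two_headed = numerator / (numerator + 0.5 * (1 - p_two_headed))
    history.append(p_two_headed)

# history -> 1/2, 2/3, 4/5, 8/9: each heads shifts belief toward two-headed.
```

Note that yesterday's posterior serves as today's prior: updating flip by flip gives the same answer as conditioning on all three heads at once.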
Theorem
Let \(f(x \mid \theta)\) denote the joint density (or mass) function of “data” \(x\), conditional on “parameter” \(\theta \in \Theta\) (discrete or continuous) having “prior” distribution \(\Pi\).
Then for any \(A \in \mathcal{B}(\Theta)\),
\[ \mathbb{P}(\theta \in A \mid x) = \frac{ \int_A f(x \mid \theta) \, \Pi(\mbox{d}\theta)}{ \int_\Theta f(x \mid \theta) \, \Pi(\mbox{d}\theta)} \, \] defines a posterior distribution of \(\theta\).
Note that \(f(x \mid \theta)\) and \(\Pi\) together define a joint distribution over \(x\) and \(\theta\).
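The posterior formula can be evaluated numerically as a ratio of two integrals. As an illustrative sketch, assume a Uniform(0,1) prior \(\Pi\), data \(x = 7\) heads in \(n = 10\) Bernoulli(\(\theta\)) trials, and the event \(A = (0.5, 1)\) (does the coin favor heads?):

```python
import math

def likelihood(theta, x=7, n=10):
    # Binomial likelihood f(x | theta)
    return math.comb(n, x) * theta**x * (1 - theta) ** (n - x)

def integrate(f, lo, hi, n=10_000):
    # simple midpoint rule
    h = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * h) for i in range(n)) * h

# Uniform prior: Pi(d theta) = d theta on (0, 1).
numerator = integrate(likelihood, 0.5, 1.0)
denominator = integrate(likelihood, 0.0, 1.0)
posterior_prob = numerator / denominator   # P(theta in A | x), roughly 0.887
```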
Example
Bayes’ original thought experiment: An Essay Towards Solving a Problem in the Doctrine of Chances, published in 1763.
Example (Bayes’ experiment)
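Bayes' experiment can be approximated by rejection sampling: a first ball lands uniformly at position \(\theta\) on the table, \(n\) further balls each land to its left with probability \(\theta\), and we condition on the observed count. The numbers below are illustrative:

```python
import random

random.seed(1)
n, x_obs = 5, 3            # observed: 3 of 5 balls left of the first ball
accepted = []
for _ in range(200_000):
    theta = random.random()                        # draw from the uniform prior
    x = sum(random.random() < theta for _ in range(n))
    if x == x_obs:                                 # keep draws matching the data
        accepted.append(theta)

posterior_mean = sum(accepted) / len(accepted)
# By conjugacy the exact posterior is Beta(x+1, n-x+1),
# with mean (x+1)/(n+2) = 4/7, which the simulation approximates.
```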
Propagation of uncertainty
When using a single model, the posterior distribution contains all necessary information for inferences on the unobserved variables.
This eliminates the need to estimate components in sequence.
Examples
Definition
Let \(\theta\) be an unknown state from a space of possible states \(\Theta\),
and let \(a\) be an available action from set \(A\).
The function \(L(\theta, a) \in \mathbb{R}\) with \(-\infty < L(\theta, a)\) is called a loss function.
Equivalently, we can call \(U(\theta, a) := -L(\theta, a)\) a utility function.
Example
Suppose you are on a first date, and you are interested in continuing to date this person.
The “state” is this person’s reciprocal interest, which is unknown to you.
Let \(\theta \in [0,1]\) with \(0\) being total disinterest.
The “actions” available to you are
\(a_1=\) Give up and move on,
\(a_2=\) Let your date make the next move, and
\(a_3=\) Invite them on a second date.
Create a two-way table defining a valid loss (or utility) function for each state (discretized or step function) and action pair.
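One possible table, with the state discretized into three levels of reciprocal interest. Every number here is invented for illustration; any assignment of finite real entries is a valid loss function:

```python
# Hypothetical loss table for the dating example: rows are discretized
# states, columns are actions. Smaller loss = better outcome.
loss = {
    # state:  give_up,        wait,        invite
    "low":    {"give_up": 0.0, "wait": 2.0, "invite": 5.0},
    "medium": {"give_up": 3.0, "wait": 1.0, "invite": 2.0},
    "high":   {"give_up": 8.0, "wait": 3.0, "invite": 0.0},
}
```

Giving up is costless when interest is low but very costly when it is high; inviting is the mirror image; waiting hedges in between.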
Definition
Given a loss function \(L(\theta, a) \in \mathbb{R}\) and probability distribution \(\Pi\) on \(\theta\), define the expected loss for any action as
\[\mathbb{E}_\Pi[L(\theta, a)] := \int_\Theta L(\theta, a) \, \Pi(\mbox{d}\theta) \, .\]
Dating example
Suppose our belief about \(\theta\) can be expressed with a beta distribution.
Calculate \(\mathbb{E}_\Pi[L(\theta, a)]\) under each possible action.
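A numerical sketch of this calculation, assuming a Beta(4, 2) prior on \(\theta\) (belief leaning toward interest) and invented continuous losses for the three actions; the expected losses are midpoint-rule approximations of \(\int_0^1 L(\theta, a)\, \pi(\theta)\, \mbox{d}\theta\):

```python
import math

def beta_pdf(t, a=4, b=2):
    # Beta(a, b) density; here Gamma(6)/(Gamma(4)Gamma(2)) = 20.
    const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return const * t ** (a - 1) * (1 - t) ** (b - 1)

losses = {                                 # all three are hypothetical
    "give_up": lambda t: 10 * t,           # regret grows with their interest
    "wait":    lambda t: 3.0,              # flat cost of stalling
    "invite":  lambda t: 8 * (1 - t),      # sting of rejection if disinterested
}

def expected_loss(L, n=10_000):
    h = 1.0 / n
    return sum(L((i + 0.5) * h) * beta_pdf((i + 0.5) * h) for i in range(n)) * h

exp_loss = {a: expected_loss(L) for a, L in losses.items()}
best = min(exp_loss, key=exp_loss.get)     # "invite" under this prior
```

With \(\mathbb{E}[\theta] = 2/3\) under Beta(4, 2), the linear losses give expected losses \(20/3\), \(3\), and \(8/3\), so inviting minimizes expected loss; a more pessimistic prior would flip the ranking.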
Definition
A decision rule \(\delta\) maps available information in data \(x\) to an action \(\delta(x)\).
The frequentist risk averages the loss with respect to the distribution of \(x\), with \(\theta\) held fixed,
\[ R(\theta, \delta) = \int L(\theta, \delta(x)) \, f(x \mid \theta) \, \mbox{d}x \, , \]
while the Bayes risk averages over both \(x\) and \(\theta\),
\[ r(\Pi, \delta) = \int_\Theta R(\theta, \delta) \, \Pi(\mbox{d}\theta) \, . \]
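The two risks can be contrasted in a small worked case. Assume squared-error loss, \(x \sim \text{Binomial}(n, \theta)\), the rule \(\delta(x) = x/n\), and a Uniform(0,1) prior (all assumptions made for this sketch):

```python
# Frequentist risk of delta(x) = x/n under squared-error loss:
# R(theta, delta) = E_{x|theta}[(theta - x/n)^2] = Var(x/n) = theta(1-theta)/n,
# a function of the fixed, unknown theta.
def frequentist_risk(theta, n=10):
    return theta * (1 - theta) / n

# Bayes risk: average the frequentist risk over the Uniform(0,1) prior;
# analytically, integral of theta(1-theta)/n on (0,1) equals 1/(6n).
def bayes_risk(n=10, m=10_000):
    h = 1.0 / m                            # midpoint rule over the prior
    return sum(frequentist_risk((i + 0.5) * h, n) for i in range(m)) * h
```

The frequentist risk is a curve over \(\theta\) (largest at \(\theta = 1/2\)); the Bayes risk collapses that curve to the single number \(1/(6n)\), which rules can then be ranked by.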
Decisions can then be made by choosing the action that minimizes the posterior expected loss, or the rule that minimizes the Bayes risk.
See Statistical Decision Theory and Bayesian Analysis by James Berger.